ABSTRACT



A method and apparatus are described for detecting doubletalk
in an acoustic echo canceller. The present invention
examines the spectral characteristics of the near-end audio
signal and the spectral characteristics of the far-end audio
signal and determines from the comparison whether a condition of
doubletalk exists. An exemplary implementation of the
present invention is presented in an acoustic echo canceller
wherein the adaptation of the adaptive filter taps is inhibited
during periods of doubletalk.

11 Claims, 2 Drawing Sheets
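As a rough illustration of the spectral comparison summarized above, the processing chain of the doubletalk detection element of FIG. 3 (noise suppression, weighting of |X(k)| by G_k, the lower bound C·|N(k)|, and the energy test by Parseval's theorem) can be sketched in Python. This is a hypothetical, non-authoritative sketch: it uses a naive DFT in place of the FFT so that only the standard library is needed, it assumes the averaged noise magnitude spectrum |N(k)| and the weights G_k are already available, and the function names, the value of `c` (standing for the constant C < 1) and the rule "declare doubletalk when the energy exceeds `ratio` times the noise energy" are assumptions, not values given in this patent.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Return |DFT(frame)|; a naive O(N^2) DFT standing in for the FFT."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


def doubletalk_energy(x_frame: Sequence[float], r_frame: Sequence[float],
                      noise_mag: Sequence[float], gains: Sequence[float],
                      c: float = 0.5) -> float:
    """Energy of |T(k)| for one frame, computed as (1/N) * sum |T(k)|^2."""
    n = len(x_frame)
    x_mag = magnitude_spectrum(x_frame)   # |X(k)|, far-end reference
    r_mag = magnitude_spectrum(r_frame)   # |R(k)|, microphone output
    energy = 0.0
    for k in range(n):
        s_k = abs(r_mag[k] - noise_mag[k])      # noise-suppressed |S(k)|
        d_k = s_k - gains[k] * x_mag[k]         # remove predicted echo G_k|X(k)|
        t_k = max(d_k, c * noise_mag[k])        # lower bound C|N(k)|, C < 1
        energy += t_k * t_k
    return energy / n


def is_doubletalk(x_frame: Sequence[float], r_frame: Sequence[float],
                  noise_mag: Sequence[float], gains: Sequence[float],
                  c: float = 0.5, ratio: float = 4.0) -> bool:
    """Declare doubletalk when the energy exceeds `ratio` times the noise energy."""
    n = len(x_frame)
    noise_energy = sum(m * m for m in noise_mag) / n   # same (1/N) scaling
    return doubletalk_energy(x_frame, r_frame, noise_mag, gains, c) > ratio * noise_energy


if __name__ == "__main__":
    N = 32
    x = [math.cos(2 * math.pi * 4 * t / N) for t in range(N)]       # far-end tone in bin 4
    v = [3 * math.cos(2 * math.pi * 10 * t / N) for t in range(N)]  # near-end tone in bin 10
    echo = [0.5 * s for s in x]                                      # flat echo gain of 0.5
    noise_mag, gains = [1.0] * N, [0.5] * N
    print(is_doubletalk(x, echo, noise_mag, gains))                               # False
    print(is_doubletalk(x, [e + s for e, s in zip(echo, v)], noise_mag, gains))  # True
```

In the usage lines, the echo-only frame leaves |T(k)| at or below the assumed noise floor, while the near-end tone, at a frequency absent from x(n), appears as a large peak in |T(k)| and trips the threshold. This is the "no new frequencies" property of an LTI echo channel expressed numerically.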
DOUBLETALK DETECTION BY MEANS OF SPECTRAL CONTENT

This is a Continuation of application Ser. No. 08/535,365, filed Sep. 28, 1995, now abandoned, which is a continuation of application Ser. No. 08/202,421, filed Feb. 28, 1994, now abandoned.

BACKGROUND OF THE INVENTION

I. Field of the Invention

The present invention relates to echo cancellation. More particularly, the present invention relates to a novel and improved method and apparatus for determining a doubletalk condition in an echo canceller.

II. Description of the Related Art

Acoustic echo cancellers (AEC) are used in teleconferencing and hands-free telephony applications to eliminate acoustic feedback between a loudspeaker and a microphone. In a cellular telephone system where the driver uses a hands-free telephone, acoustic echo cancellers are used in the mobile station to provide full-duplex communications. A block diagram of a traditional acoustic echo canceller is illustrated in FIG. 1.

For reference purposes, the driver is the near-end talker with input speech signal v(n), and the person at the other end of the connection is the far-end talker with input digital speech signal x(n). The speech of the far-end talker is broadcast out of loudspeaker 2. If it is picked up by microphone 10, the far-end talker hears an annoying echo of his or her own voice. The output of microphone 10, r(n), is a digital signal. Typically, the functions performed by microphone 10 may be accomplished by a microphone, which converts the audio signal to an analog electrical signal, and an analog-to-digital (A/D) converter. The AEC identifies the impulse response between speaker 2 and microphone 10, generates a replica of the echo using adaptive filter 14, and subtracts it in summer 12 from the microphone output, r(n), to cancel the far-end talker echo y(n). Since the adaptive filter cannot generally remove all of the echo, some form of echo suppression, provided by residual echo suppression element 18, is typically employed to remove any residual echo.

In FIG. 1, the far-end talker echo signal y(n) is represented as the output of acoustic echo path element 4, which is an artifact of the proximity of loudspeaker 2 and microphone 10. Noise signal w(n) and near-end speech signal v(n) are added to the far-end talker echo signal y(n), as illustrated by summing elements 6 and 8, respectively. It should be noted that summing elements 6 and 8 and acoustic echo path 4 are artifacts of the mobile environment and are presented for illustrative purposes.

Since adaptive filter 14 uses the far-end speech x(n) as a reference signal, it cannot cancel the near-end speech, because in general v(n) is uncorrelated with x(n). If adaptive filter 14 is allowed to adapt in the presence of v(n), the near-end speech is added to the error signal e(n), which drives the filter tap coefficient adaptation, and corrupts the estimate of acoustic echo path 4. It is therefore necessary to disable coefficient adaptation when both talkers are speaking, a condition referred to as doubletalk. During doubletalk, residual echo suppression element 18 must also be disabled to prevent corruption of the near-end speech. Doubletalk detector 16 detects the presence of doubletalk and provides control signals to adaptive filter 14 and residual echo suppression element 18 when doubletalk is present.

Doubletalk detection is the most critical element in any acoustic echo canceller. Network echo cancellers can monitor the fairly constant loss between x(n) and r(n) to gain information about whether near-end speech is present; acoustic echo cancellers cannot rely on such a constant loss. Since the analog speaker volume control is under the control of the driver, the volume can be changed to any desired level at any time. The volume can even be set so high as to produce a gain between speaker and microphone. The microphone position may also change at any time.

Traditionally, doubletalk detection in acoustic environments is accomplished by monitoring the echo return loss enhancement (ERLE), which is defined as:

$$\mathrm{ERLE}\ (\mathrm{dB}) = 10\log_{10}\left(\frac{\sigma_y^2}{\sigma_e^2}\right) \qquad (1)$$

where $\sigma_y^2$ is the variance of the echo signal y(n) and $\sigma_e^2$ is the variance of the error e(n). The variances $\sigma_y^2$ and $\sigma_e^2$ are estimated using short-term energy measurements of r(n) and e(n), respectively. The ERLE measures how much energy is being removed in summing element 12. Classical doubletalk detectors declare that near-end speech is present if the ERLE falls below some preset threshold, such as 3 or 6 dB.

This doubletalk detection method is highly unreliable, especially in high-noise environments, for several reasons. First, this method requires adaptive filter 14 to be converged before the ERLE can provide any valid information. In a noisy environment like a car, adaptive filter 14 may not converge at all, or may converge extremely slowly, due to the noise and the long filter length required [...] doubletalk detections. Third, a change in the echo path also produces a loss in ERLE. If people are moving within the mobile environment, or the microphone changes its position, the ERLE will drop, causing a false doubletalk detection.

SUMMARY OF THE INVENTION

The present invention is a novel and improved method and apparatus for detecting doubletalk. This newly proposed method for doubletalk detection measures and compares the spectral content of the far-end reference signal x(n) and the received signal r(n). The unknown acoustic echo channel is modeled as a linear time-invariant (LTI) system. Although the unknown channel may in actuality vary with time, it [...] track it, therefore permitting use of this model. A useful property of LTI systems is that they do not create any new frequencies. That is, if the input to an LTI system consists of frequencies A, B, and C, the output of the system can contain only scaled replicas of these three frequencies. No new frequencies may be present at the output if the system is linear.

Using the Fourier transform, both the far-end reference signal x(n) and the received signal r(n) can be represented as a sum of complex exponentials. Since the received echo signal at the microphone sounds like the original far-end signal, the frequency components that are large in the received echo must also have been large in the reference signal. If there are large peaks in the received signal that are not present in the reference signal, then these peaks were not caused by echo. Therefore, by comparing the frequency content of the reference and received signals, it can be determined whether near-end speech is present, even without knowledge of the unknown echo channel.

DESCRIPTION OF THE DRAWINGS

The features, objects, and advantages of the present invention will become more apparent from the detailed
description set forth below when taken in conjunction with the drawings in which like reference characters identify correspondingly throughout and wherein:

FIG. 1 is a block diagram of a traditional acoustic echo canceller;

FIG. 2 is a block diagram of the acoustic echo canceller of the present invention; and

FIG. 3 is a block diagram of the doubletalk detection apparatus of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Referring to FIG. 2, in the preferred embodiment the frequency representations of x(n) and r(n) are obtained using the Fast Fourier Transform (FFT), a fast implementation of the Discrete Fourier Transform (DFT), the implementation of which is well known in the art. X(k) and R(k) are the sets of frequency components of x(n) and r(n), respectively, where the lengths and frequency spacings of X(k) and R(k) are determined by the order of the transform.

The far-end speech signal x(n) is provided to loudspeaker 30 and FFT element 44. The far-end speech signal x(n) is broadcast out of loudspeaker 30 into acoustic echo path 32, which provides echo signal y(n). Noise signal w(n) and near-end speech signal v(n) are added to echo signal y(n), as illustrated by summers 34 and 36, respectively. Again, it should be noted that summers 34 and 36 and acoustic echo path 32 are artifacts of the mobile environment and are presented for illustrative purposes. The sum of echo signal y(n), noise signal w(n), and near-end speech signal v(n) is provided to microphone 38. The output of microphone 38 is r(n).

The far-end speech signal x(n) is provided to FFT element 44, which determines the frequency representation of the far-end speech signal, X(k). The output of microphone 38, r(n), is provided to FFT element 40, which determines the frequency representation of the microphone output, R(k). The frequency representations are provided to doubletalk detection element 42, which compares the two signals and determines whether doubletalk is present. If doubletalk is determined to be present, doubletalk detection element 42 provides a control signal to adaptive filter 46 to curtail adaptation of filter tap values, and also provides a control signal to residual echo suppression element 50 to curtail its operation.

Adaptive filter 46 estimates the echo signal in accordance with the far-end speech signal x(n) and the error signal e(n). The estimated echo signal ŷ(n) is subtracted from the output of microphone 38, r(n), in summer 48. The output of summer 48 is the error signal, e(n), which is provided to residual echo suppression element 50, where additional echo suppression takes place.

In FIG. 3, doubletalk detection element 42 is shown in further detail. Doubletalk detection is performed in the frequency domain. The respective spectral components X(k) and R(k) are converted into polar form by polar conversion elements 70 and 92, respectively, to obtain their respective magnitude components |X(k)| and |R(k)|. The received car noise is suppressed in noise suppression element 82 to prevent spurious noise frequency peaks from being interpreted as doubletalk.

In noise suppression element 82, the noise is suppressed by low-pass averaging of the noise spectrum in noise spectrum averaging element 90 during periods of silence. Periods of silence are detected by silence detector 88, which enables noise spectrum averaging element 90 during detected periods of silence. Noise spectrum averaging element 90 provides averaged noise magnitude spectrum |N(k)| to summer 86. In summer 86, the averaged noise magnitude spectrum |N(k)| is subtracted from the received magnitude spectrum |R(k)|. The absolute value of the difference is determined in magnitude element 84 to obtain the noise-suppressed received magnitude spectrum |S(k)|.

The magnitude components of the far-end speech spectrum |X(k)| are weighted in multiplier 72 by G_k, where G_k is a frequency-dependent scalar that estimates the echo channel magnitude response at that frequency. The output of multiplier 72, G_k|X(k)|, is provided to summer 74, where it is subtracted from the noise-suppressed received magnitude spectrum |S(k)|. This difference is compared to the product of a constant C (C < 1) and |N(k)|, and the maximum of the two is chosen to form magnitude spectrum |T(k)| in selection element 76. Using C·|N(k)| as a lower bound ensures that each frequency component makes a positive contribution toward |T(k)|. The energy of |T(k)| is computed in energy computation element 80 by Parseval's theorem:

$$E_T = \frac{1}{N}\sum_{k=0}^{N-1} |T(k)|^2$$

where N represents the order of the FFT. If this energy exceeds some predetermined threshold as compared with the average background noise energy, doubletalk is declared.

The coefficients G_k can be computed by several means. If adaptive filter 46 has converged, they can be estimated by finding the magnitude spectrum of the impulse response of the adaptive filter. In a noisy situation where the filter has not converged, these coefficients can be approximated by time-averaging the quotient |S(k)|/|X(k)| for large components of X(k) when doubletalk is not declared. That is, for each frame of N samples corresponding to a set of N frequency components X(k), only the estimates of G_k for the largest frequency peaks in |X(k)| are updated, and the other coefficients are left unchanged. This gives a more accurate estimate in the presence of noise. The method and apparatus described in the exemplary embodiment for the detection of doubletalk are equally applicable to the detection of near-end-only speech and far-end-only speech conditions.

The previous description of the preferred embodiments is provided to enable any person skilled in the art to make or use the present invention. The various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without the use of the inventive faculty. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

I claim:

1. An apparatus for detecting doubletalk comprising:
a first transform element having an input for receiving a far-end signal and having an output;
a second transform element having an input for receiving a near-end signal and having an output, the near-end signal including an uncancelled echo component;
a detector having a first input coupled to said first transform element output and a second input coupled to said second transform element output for detecting a doubletalk condition in accordance with a signal provided by said first transform element and a signal provided by said second transform element; and



an adaptive filter coupled to the detector, the adaptive 
filter configured for adapting filter tap values, all adapt- 
ing of filter tap values being prevented when the 
detector detects a doubletalk condition. 

2. An echo canceller comprising:
first transform means for receiving a far-end audio signal
and transforming said far-end audio signal to a fre-
quency representation of said far-end audio signal in
accordance with a predetermined transform format;

second transform means for receiving a near-end audio
signal including an uncancelled echo component and
transforming said near-end audio signal including the
uncancelled echo component to a frequency represen-
tation of said near-end audio signal in accordance with
a predetermined second transform format;

detection means for receiving a first signal representative 
of said frequency representation of said far-end audio 
signal and a second signal representative of said fre-
quency representation of said near-end audio signal and
for comparing said first and second signals with each 
other and selectively providing a doubletalk signal in 
accordance with said comparison; 

adaptive filter means for receiving said far-end audio 
signal and said doubletalk signal, for generating an
estimated echo signal in accordance with said far-end
audio signal and a set of adaptive filter parameters, and
for adapting said set of adaptive filter parameters only
when said doubletalk signal is absent; and 

echo removal means for receiving said near-end audio
signal and said estimated echo signal and subtracting 
said estimated echo signal from said near-end audio 
signal. 

3. The apparatus of claim 2 further comprising a residual 
echo suppression means for receiving an echo residual
signal and suppressing remaining echo in said echo residual
signal in accordance with an echo suppression format.

4. An apparatus for detecting doubletalk, comprising:
first transform means for receiving a far-end audio signal
and for transforming said far-end audio signal to a
far-end frequency representation of said far-end audio
signal in accordance with a predetermined first trans-
form format;

second transform means for receiving a near-end audio 
signal including an unremoved echo component and a
noise component and for transforming said near-end 
audio signal to a near-end frequency representation of 
said near-end audio signal in accordance with a prede- 
termined second transform format; 

noise suppression means for receiving said near-end fre- 
quency representation and for generating a noise- 
suppressed near-end frequency representation in accor- 
dance with a predetermined noise suppression format; 


detection means for receiving said far-end frequency 
representation and said noise-suppressed near-end fre- 
quency representation and for generating a signal 
indicative of a doubletalk condition in accordance with 
said far-end frequency representation and said noise-
suppressed near-end frequency representation.

5. The apparatus of claim 4 wherein said detection means
comprises:

subtraction means for subtracting said far-end frequency 
representation from said noise-suppressed near-end fre-
quency representation to provide a difference signal;

energy computation means for determining an energy
value of said difference signal in accordance with a 
predetermined energy computation format; and 

comparison means for comparing said difference signal
energy value with the predetermined threshold value
and for selectively providing a signal indicative of a
doubletalk condition in accordance with said compari-
son.

6. The apparatus of claim 5 wherein said far-end fre- 
quency representation comprises frequency components, 
and wherein said detection means further comprises weight-
ing means for weighting said frequency components of said
far-end frequency representation. 

7. The apparatus of claim 5 wherein said noise suppres-
sion means generates said noise-suppressed near-end fre-
quency representation by generating a noise spectrum esti-
mate of said noise component and subtracting said noise 
spectrum estimate from said near-end frequency represen- 
tation. 

8. A method for detecting the existence of a doubletalk 
condition wherein said doubletalk condition exists when 
both near-end and far-end audio signals are present, said 
near-end audio signal including an unremoved echo com-
ponent and a noise component, comprising the steps of:

transforming said far-end audio signal to a frequency 
representation of said far-end audio signal in accor-
dance with a predetermined first transform format;

transforming said near-end audio signal to a frequency 
representation of said near-end audio signal in accor- 
dance with a predetermined second transform format; 

suppressing said noise component of said near-end fre- 
quency representation in accordance with a predeter- 
mined noise suppression format to generate a noise 
suppressed frequency format; and 

determining the presence of said doubletalk condition in 
accordance with said far-end frequency representation 
and said noise suppressed near-end frequency repre- 
sentation. 

9. The method of claim 8 wherein said step of determining
comprises the steps of: 

subtracting said far-end frequency representation from 
said noise suppressed near-end frequency representa- 
tion to provide a difference signal; 

determining an energy value of said difference signal in 
accordance with a predetermined energy computation
format; and

comparing said difference signal energy value with a
predetermined threshold value to selectively provide a 
signal indicative of said doubletalk condition. 

10. The method of claim 9 wherein said far-end frequency
representation comprises frequency components, further
comprising the step of weighting said frequency compo-
nents.

11. The method of claim 8 wherein said step of suppress-
ing comprises the steps of:
generating a noise spectrum estimate; and
subtracting said noise spectrum estimate from said near-
end frequency representation.
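The two adaptive quantities on which the detector relies, the averaged noise magnitude spectrum |N(k)| (noise spectrum averaging element 90; compare claims 7 and 11) and the echo-channel weights G_k (compare claims 6 and 10), can be tracked as sketched below. This is an illustrative sketch only: the description calls for low-pass averaging of the noise spectrum during silence and for time-averaging |S(k)|/|X(k)| over the largest peaks of |X(k)| when doubletalk is not declared, but it does not specify the averaging filter, so the one-pole smoothing constants `alpha` and `beta`, the peak count `num_peaks`, and the function names are hypothetical choices.

```python
from typing import List, Sequence


def update_noise_spectrum(noise_mag: Sequence[float], r_mag: Sequence[float],
                          alpha: float = 0.9) -> List[float]:
    """One-pole low-pass average of |R(k)| into the noise estimate |N(k)|.

    Meant to be called only on frames flagged as silence, when |R(k)|
    is assumed to contain background noise alone.
    """
    return [alpha * n + (1.0 - alpha) * r for n, r in zip(noise_mag, r_mag)]


def update_echo_gains(gains: Sequence[float], s_mag: Sequence[float],
                      x_mag: Sequence[float], num_peaks: int = 4,
                      beta: float = 0.8) -> List[float]:
    """Time-average |S(k)|/|X(k)| into G_k on the largest peaks of |X(k)|.

    Meant to be called only on frames where doubletalk is not declared;
    coefficients outside the selected peaks are left unchanged.
    """
    # Indices of the num_peaks largest far-end magnitude components.
    peaks = sorted(range(len(x_mag)), key=lambda k: x_mag[k], reverse=True)[:num_peaks]
    new_gains = list(gains)
    for k in peaks:
        if x_mag[k] > 0.0:  # skip empty bins to avoid division by zero
            new_gains[k] = beta * gains[k] + (1.0 - beta) * (s_mag[k] / x_mag[k])
    return new_gains


if __name__ == "__main__":
    # One dominant far-end bin (k = 1) whose true echo gain is 0.5.
    x_mag = [0.0, 2.0, 0.1, 0.0]
    s_mag = [0.0, 1.0, 0.3, 0.0]
    g = [0.0] * 4
    for _ in range(50):
        g = update_echo_gains(g, s_mag, x_mag, num_peaks=1)
    print(round(g[1], 3), g[2])  # 0.5 0.0
```

In the usage lines, bin 2 has a weak far-end component whose ratio |S(k)|/|X(k)| = 3.0 is dominated by residual noise; restricting updates to the strongest peak leaves that coefficient untouched, which is presumably why the description reports a more accurate estimate in the presence of noise.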